Add TraceQL query hint to retrieve most recent results ordered by trace start time #4238

Merged 19 commits on Feb 12, 2025

Conversation

@joe-elliott (Member) commented Oct 25, 2024

What this PR does

Adds a query hint, with (most_recent=true), that causes Tempo to perform a more thorough search and return the most recent traces ordered by start time. This is currently implemented as a query hint because, depending on the circumstances, it can have a quite large impact on the time to return results on both the streaming gRPC and discrete HTTP paths. A TraceQL hint was chosen over a query param because the longer-term goal is to make this the default behavior, and it will be easier to remove a query hint than a query parameter.
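For example (the span filter below is just an illustration, not taken from this PR), a search using the hint looks like:

```
{ resource.service.name = "api" } with (most_recent=true)
```

With the hint set, Tempo keeps searching past the point where the result limit is first reached and returns the limit's worth of traces with the latest start times.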

This change is implemented at the query frontend, query engine, and ingester search levels. In all cases the search must store the most recent results and continue searching instead of returning immediately once the limit is hit.

Other Changes

Which issue(s) this PR fixes:
This PR will give a way to always return the most recent results ordered by start time which partially addresses #3777, #3109 and #2659.

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

Signed-off-by: Joe Elliott <[email protected]>
@knylander-grafana (Contributor) left a comment

Thank you for updating the docs! Updates look good.

}
i.headBlockMtx.RUnlock()
if err := anyErr.Load(); err != nil {
	return nil, err
}
if combiner.Count() >= maxResults {
Contributor:

One benefit of this early return was that it didn't have to take the blocks mutex if the search was satisfied by the head block. Is that still possible if the query isn't using the most_recent hint?

Member Author (joe-elliott):

Yup, that check has been moved here. For the anyCombiner the time is ignored and it's just the old limit check.
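A minimal sketch of that distinction, with illustrative names (not the PR's actual types): the plain limit check can stop as soon as enough results are collected, while the most-recent path never treats the limit alone as a reason to stop.

```go
// Sketch only: contrasts the two completion behaviors described above.
// Names are illustrative, not the exact code in this PR.
package main

import "fmt"

// Without the hint (the "anyCombiner" case): once the limit is hit the
// search is satisfied and can return immediately, e.g. without ever
// taking the blocks mutex if the head block alone produced enough results.
func doneWithoutHint(count, limit int) bool {
	return limit > 0 && count >= limit
}

// With most_recent=true the limit is never enough on its own: an
// unsearched block may still hold a more recent trace, so the search
// continues and only the most recent results are kept.
func doneWithMostRecent() bool {
	return false
}

func main() {
	fmt.Println(doneWithoutHint(20, 20)) // true: stop early
	fmt.Println(doneWithMostRecent())    // false: keep searching
}
```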


if c.Count() == c.keepMostRecent && c.keepMostRecent > 0 {
	// if this is older than the oldest element, bail
	if c.OldestTimestampNanos() > new.StartTimeUnixNano {
Contributor:

I have a feeling that this function is correct, but I need help understanding it a bit more. I'm wondering if it should check the new timestamp cutoff before merging with the existing entry? Or, more generally, how does the timestamp sorting work when merging? If we get an older partial snippet of an existing entry, it will overwrite the start time to be older. In that case, should the result be kicked out? If the trace had been fully combined into one snippet (perfect compaction), it seems like it would have been kicked out (properly counted as older than other results).

@joe-elliott (Member Author) commented Nov 7, 2024

Yes, these are good callouts. I wrote this logic thinking about the combiner being used in the engine, but the truth is there are all kinds of weird behaviors with fractured traces when it's used in the query frontend. I don't know if there are good answers for all these edge cases without rethinking things at the storage layer. This may be a feature that is always best effort.

This is weird:

  • Add metadata for trace id 1234 to the most recent list
  • It gets pushed out by more recent traces
  • Receive a new shard for 1234 that re-qualifies it for the most recent list
  • We correctly return 1234 but have lost some information about it

Or this:

  • Receive 10 "most recent" traces and return a result to the client
  • There is a trace shard earlier in the search window which pushes one of the returned results out of the most recent list.
  • We have incorrectly returned a trace that should not have been in the list.

Contributor:

Can you explain why this is the opposite of line 140? Because this is saying that if the oldestTimeStamp is older than the new metadata, then discard the new data.

Contributor:

Adding to the edge case scenarios: what about traces whose root spans have not yet been received and would put them outside of the search window?

Member Author (joe-elliott):

> Can you explain why this is the opposite of line 140? Because this is saying that if the oldestTimeStamp is older than the new metadata, then discard the new data.

Line 140 is checking if it's newer than the oldest timestamp. If it is newer, it converts the spanset to metadata and attempts to add it. Line 159 is checking if it's older than the oldest timestamp; if it's older, it doesn't add it. The first is a check to include the spanset, the second is a check to exclude the spanset.

> Adding to the edge case scenarios: what about traces whose root spans have not yet been received and would put them outside of the search window?

Yeah, I think that's another case (similar to the second one I mentioned) where not-yet-discovered information will push a trace out of the window.

I think an exhaustive search would be required to completely remove all edge cases.
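A minimal sketch of the two checks described in this reply, written as plain functions over nanosecond start times (the flat signatures and names are assumptions for illustration, not the PR's actual combiner API):

```go
// Sketch only: the "include" and "exclude" checks described above,
// expressed over plain values. kept is how many traces are currently
// held, keep is the keepMostRecent limit, oldestKept is the start time
// of the oldest trace still in the list.
package main

import "fmt"

// Include check ("line 140"): a spanset is only worth converting to
// metadata if the list is not yet full or it is newer than the oldest
// trace we are keeping.
func worthConverting(startNanos, oldestKept uint64, kept, keep int) bool {
	if keep == 0 || kept < keep {
		return true
	}
	return startNanos > oldestKept
}

// Exclude check ("line 159"): when adding metadata to a full list, bail
// if it is older than the oldest element already kept.
func shouldDrop(startNanos, oldestKept uint64, kept, keep int) bool {
	return keep > 0 && kept == keep && oldestKept > startNanos
}

func main() {
	// full list of 20, oldest kept trace started at t=100
	fmt.Println(worthConverting(150, 100, 20, 20)) // true: newer, include it
	fmt.Println(shouldDrop(50, 100, 20, 20))       // true: older, exclude it
}
```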

@joe-elliott requested a review from mdisibio on November 7, 2024

github-actions bot commented Jan 7, 2025

This PR has been automatically marked as stale because it has not had any activity in the past 60 days.
The next time this stale check runs, the stale label will be removed if there is new activity. This pull request will be closed in 15 days if there is no new activity.
Please apply keepalive label to exempt this Pull Request.

@github-actions bot added the stale label on Jan 7, 2025
@bastischubert added the keepalive label and removed the stale label on Jan 9, 2025
@knylander-grafana added the type/docs label on Jan 15, 2025
@joe-elliott removed the type/docs label on Jan 16, 2025
return s.completedThroughSeconds
}

s.shards = shards
Contributor:

Is there no possibility of the responses coming in out of order?

Member Author (joe-elliott):

There absolutely is. In fact, most of the complexity of this code is due to the fact that responses can and do come in any order. There should be only one shards response, though.

These tests attempt to cover all of the out-of-order cases I could think of:

https://github.com/grafana/tempo/pull/4238/files#diff-01e87bde5e33cf0e55bb439f54ed688f4c57e4b5843c959454a9bf6760fcc97cR663
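A rough sketch of the idea behind the snippet above (the field and method names mirror that snippet; everything else is assumed): track per-shard completion and only advance the completed-through point across a contiguous run of finished shards from the most recent end, so out-of-order completions cannot prematurely finalize results.

```go
// Sketch only: one way to track how far back a time-sharded search is
// complete when shard results arrive in any order. The real tracker in
// this PR differs in its details.
package main

import "fmt"

type shard struct {
	coversDownTo uint32 // oldest unix second this shard covers
	completed    bool
}

type tracker struct {
	// set once by the single "shards" response, ordered newest to oldest
	shards []shard
}

// completedThroughSeconds walks shards from the most recent end and stops
// at the first incomplete one. Traces starting at or after the returned
// time are final and safe to return to the client in start-time order.
func (t *tracker) completedThroughSeconds() uint32 {
	var through uint32
	for _, s := range t.shards {
		if !s.completed {
			break
		}
		through = s.coversDownTo
	}
	return through
}

func main() {
	t := &tracker{shards: []shard{
		{coversDownTo: 300, completed: true},
		{coversDownTo: 200, completed: false}, // finished jobs arrived out of order
		{coversDownTo: 100, completed: true},
	}}
	fmt.Println(t.completedThroughSeconds()) // 300: only the newest shard is finalized
}
```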



@ie-pham (Contributor) commented Feb 11, 2025

I've reviewed as thoroughly as I can and nothing obvious is jumping out at me. Have you done any benchmarks on whether this has any big impact on performance for the regular (non-recent) search with the new implementation? I don't believe much was changed, but that would be great assurance that nothing is broken. Otherwise, I'm good to approve.

@joe-elliott (Member Author) commented Feb 12, 2025

There is minimal impact on the normal search path. It does unnecessarily create, return, and track shards on the normal search path, but testing did not show any performance difference, and I was hesitant to wire up all the complexity necessary to avoid that work. wdyt?

Edit: I added code to skip tracking for searches without the query hint. It's also possible to not create the objects needed for tracking, but that would require a bit more rewiring.

@ie-pham (Contributor) commented Feb 12, 2025

hit it

@joe-elliott (Member Author):

hitting it!
